Prosper Loan Data Exploration by Masa Hayakawa

This report explores Prosper Loan dataset that contain loan information for approximately 114,000 loan records.

Data Wrangling Section

Before exploring analysis, I would like to deal with any missing values.

Above chart is showing there are some missing values with ProsperRating..numeric, ProsperRating..Alpha, ProsperScore, and DebtToIncomeRatio, and EmploymentStatus.

Univariate Plots Section

First of all, in this section, I would like to perform some preliminary exploration of the dataset.

Original Prosper loan dataset contains approximately 114,000 observations and 81 variables. I have created a subset from original data so that I can only focus on already closed loans. Also, I have picked up only necessary variables. Thus, the dataset in this report includes 55,084 observations with 16 variables as follows.

##  [1] 640 480 700 760 620 680 660 740 520 720 600 580 540 820 560 780 500
## [18] 840 860   0 800 460 880 440 420 360
## [1] 55084    30

Structure of the dataset is listed as follows:

## 'data.frame':    55084 obs. of  30 variables:
##  $ LoanOriginalAmount         : int  9425 3001 1000 4000 10000 3000 2000 4000 4000 10000 ...
##  $ BorrowerRate               : num  0.158 0.275 0.133 0.318 0.125 ...
##  $ StatedMonthlyIncome        : num  3083 2083 1667 5500 5833 ...
##  $ DebtToIncomeRatio          : num  0.17 0.06 0.27 0.49 0.12 0.09 0.39 0.11 0.26 0.11 ...
##  $ MonthlyLoanPayment         : num  330.4 123.3 33.8 173.7 334.5 ...
##  $ ProsperScore               : num  0 0 0 5 0 0 5 3 9 9 ...
##  $ IncomeRange                : Factor w/ 8 levels "$0","$1-24,999",..: 4 7 2 5 7 4 4 4 6 4 ...
##  $ LoanStatus                 : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 3 3 5 2 3 5 3 3 3 ...
##  $ EmploymentStatus           : Factor w/ 8 levels "Employed","Full-time",..: 8 3 2 5 3 2 1 1 2 1 ...
##  $ IsBorrowerHomeowner        : Factor w/ 2 levels "False","True": 2 1 1 2 2 1 1 1 2 1 ...
##  $ ProsperRating..Alpha.      : chr  "NA" "NA" "NA" "HR" ...
##  $ LoanOriginationDate        : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 260 451 1399 160 488 1655 1609 941 1348 ...
##  $ ListingCategory..numeric.  : int  0 0 0 13 0 0 15 1 1 20 ...
##  $ Term                       : int  36 36 36 36 36 36 36 60 36 36 ...
##  $ ProsperRating..numeric.    : int  NA NA NA 1 NA NA 3 2 6 6 ...
##  $ EmploymentStatusDuration   : num  2 0 19 121 0 36 3 25 10 44 ...
##  $ CurrentCreditLines         : num  5 0 2 10 0 4 7 7 16 4 ...
##  $ OpenRevolvingAccounts      : int  1 0 1 5 16 3 5 6 11 4 ...
##  $ OpenRevolvingMonthlyPayment: num  24 0 40 300 346 15 245 95 294 105 ...
##  $ TotalInquiries             : num  3 1 5 1 17 8 0 3 8 0 ...
##  $ CurrentDelinquencies       : num  2 1 3 0 2 1 0 0 0 1 ...
##  $ RevolvingCreditBalance     : num  0 0 1220 9103 0 ...
##  $ BankcardUtilization        : num  0 0 0.32 0.97 0 0.08 0.84 0.3 0.09 0.13 ...
##  $ AvailableBankcardCredit    : num  1500 0 2580 178 0 ...
##  $ IncomeVerifiable           : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ CreditScoreRangeLower      : num  640 480 640 700 760 620 680 660 700 740 ...
##  $ LoanOriginalAmountRange    : Factor w/ 20 levels "$0 - $999","$1,000 - $1,999",..: 9 3 1 4 10 3 2 4 4 10 ...
##  $ StatedMonthlyIncomeRange   : Factor w/ 20 levels "$0 - $999","$1,000 - $1,999",..: 3 2 1 5 5 3 2 3 6 3 ...
##  $ IsDefaulted                : chr  "Completed" "Completed" "Completed" "Defaulted/Charge-off" ...
##  $ MonthlyIncomePaymentDiff   : num  0.893 0.941 0.98 0.968 0.943 ...

Summary of each variables are as follows:

##  LoanOriginalAmount  BorrowerRate    StatedMonthlyIncome DebtToIncomeRatio
##  Min.   : 1000      Min.   :0.0000   Min.   :     0      Min.   : 0.0000  
##  1st Qu.: 2600      1st Qu.:0.1350   1st Qu.:  2811      1st Qu.: 0.1000  
##  Median : 4500      Median :0.1945   Median :  4167      Median : 0.1900  
##  Mean   : 6262      Mean   :0.2004   Mean   :  5055      Mean   : 0.2676  
##  3rd Qu.: 8000      3rd Qu.:0.2699   3rd Qu.:  6250      3rd Qu.: 0.2900  
##  Max.   :35000      Max.   :0.4975   Max.   :618548      Max.   :10.0100  
##                                                                           
##  MonthlyLoanPayment  ProsperScore            IncomeRange   
##  Min.   :   0.00    Min.   : 0.000   $25,000-49,999:16343  
##  1st Qu.:  98.29    1st Qu.: 0.000   $50,000-74,999:12789  
##  Median : 172.60    Median : 0.000   Not displayed : 7737  
##  Mean   : 223.73    Mean   : 2.958   $75,000-99,999: 6442  
##  3rd Qu.: 300.47    3rd Qu.: 6.000   $100,000+     : 6064  
##  Max.   :2251.51    Max.   :11.000   $1-24,999     : 4571  
##                                      (Other)       : 1138  
##                   LoanStatus         EmploymentStatus IsBorrowerHomeowner
##  Completed             :38074   Full-time    :24957   False:29199        
##  Chargedoff            :11992   Employed     :16491   True :25885        
##  Defaulted             : 5018   Not available: 5346                      
##  Cancelled             :    0   Self-employed: 2926                      
##  Current               :    0   Part-time    : 1056                      
##  FinalPaymentInProgress:    0   (Other)      : 2056                      
##  (Other)               :    0   NA's         : 2252                      
##  ProsperRating..Alpha.          LoanOriginationDate
##  Length:55084          2007-01-17 00:00:00:  137   
##  Class :character      2011-08-31 00:00:00:  133   
##  Mode  :character      2008-04-30 00:00:00:  128   
##                        2008-05-28 00:00:00:  126   
##                        2011-09-30 00:00:00:  121   
##                        2011-11-30 00:00:00:  120   
##                        (Other)            :54319   
##  ListingCategory..numeric.      Term       ProsperRating..numeric.
##  Min.   : 0.000            Min.   :12.00   Min.   :1.000          
##  1st Qu.: 0.000            1st Qu.:36.00   1st Qu.:2.000          
##  Median : 1.000            Median :36.00   Median :3.000          
##  Mean   : 2.408            Mean   :36.94   Mean   :3.663          
##  3rd Qu.: 3.000            3rd Qu.:36.00   3rd Qu.:5.000          
##  Max.   :20.000            Max.   :60.00   Max.   :7.000          
##                                            NA's   :29079          
##  EmploymentStatusDuration CurrentCreditLines OpenRevolvingAccounts
##  Min.   :  0.00           Min.   : 0.000     Min.   : 0.000       
##  1st Qu.:  9.00           1st Qu.: 4.000     1st Qu.: 3.000       
##  Median : 41.00           Median : 8.000     Median : 5.000       
##  Mean   : 69.72           Mean   : 8.248     Mean   : 6.078       
##  3rd Qu.: 99.00           3rd Qu.:12.000     3rd Qu.: 8.000       
##  Max.   :755.00           Max.   :59.000     Max.   :51.000       
##                                                                   
##  OpenRevolvingMonthlyPayment TotalInquiries    CurrentDelinquencies
##  Min.   :    0.0             Min.   :  0.000   Min.   : 0.0000     
##  1st Qu.:   60.0             1st Qu.:  2.000   1st Qu.: 0.0000     
##  Median :  183.0             Median :  5.000   Median : 0.0000     
##  Mean   :  325.3             Mean   :  7.016   Mean   : 0.8948     
##  3rd Qu.:  418.0             3rd Qu.:  9.000   3rd Qu.: 1.0000     
##  Max.   :14985.0             Max.   :379.000   Max.   :83.0000     
##                                                                    
##  RevolvingCreditBalance BankcardUtilization AvailableBankcardCredit
##  Min.   :      0        Min.   :0.0000      Min.   :     0         
##  1st Qu.:    429        1st Qu.:0.0400      1st Qu.:    92         
##  Median :   4368        Median :0.4600      Median :  2019         
##  Mean   :  13525        Mean   :0.4573      Mean   :  9360         
##  3rd Qu.:  13750        3rd Qu.:0.8100      3rd Qu.:  9791         
##  Max.   :1435667        Max.   :5.9500      Max.   :646285         
##                                                                    
##  IncomeVerifiable CreditScoreRangeLower    LoanOriginalAmountRange
##  False: 4300      Min.   :  0.0         $1,000 - $1,999: 7933     
##  True :50784      1st Qu.:620.0         $2,000 - $2,999: 7118     
##                   Median :680.0         $3,000 - $3,999: 6858     
##                   Mean   :664.6         $0 - $999      : 6468     
##                   3rd Qu.:720.0         $4,000 - $4,999: 5612     
##                   Max.   :880.0         $6,000 - $6,999: 3959     
##                                         (Other)        :17136     
##     StatedMonthlyIncomeRange IsDefaulted        MonthlyIncomePaymentDiff
##  $2,000 - $2,999: 9434       Length:55084       Min.   :-12570.730      
##  $1,000 - $1,999: 9379       Class :character   1st Qu.:     0.926      
##  $3,000 - $3,999: 8103       Mode  :character   Median :     0.958      
##  $4,000 - $4,999: 6601                          Mean   :   -15.523      
##  $0 - $999      : 6350                          3rd Qu.:     0.977      
##  $5,000 - $5,999: 4594                          Max.   :     1.000      
##  (Other)        :10623

Now, I would like to look into the dataset. Firstly, I would like to explore some of the univariate plots.

First histogram is showing right skewed distribution. I have transformed the long tail data to better understand the distriution. The transformed loan amount distribution appears to have unimodal with peaks between 5,000 - 7,000.

I would like to look at other numerical variables as well.

Histogram by Borrwer rate is showing slightly right skewed unimodal distribution with its peak around 0.15.

Histogram by loan term only shows 12, 36 and 60 months. Why are they so cloear cut? According to Prosper website[1], they seem to be offering only fixed term with either 3 or 5 years.

[1] https://www.prosper.com/ Prosper website

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2811    4167    5055    6250  618548

Next, I would like to look across categorical variables.

Firstly let’s look at the distribution by Income Range.

I was expecting to see more right skewed distribution, in ohter words, tendecy like borrower with lower income tend to borrow more. However, the plot shows unimodal distribution with peak around $25,000 and $50,000.

## [1] 0.8247404

This plot shows distribution by Employment status. 82.4 % of the borrowers are employed in some form, i.e. either ‘Employed’,‘Full-time’, ‘Part-time’, or ‘Self-employed’.

## [1] 0.4699187

Above plot showing distribution by whether or not the borrower is home owner. 46.9% of the borrowers are home owners.

Next, let’s have alook at distribution by risk indicators.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   2.958   6.000  11.000

Above plot by Prosper score is showing slightly left skewed unimodal distribution with its peak around 8 with median of 6 and most of them are between 5 and 8.

This plot shows distribution by Prosper rating. Borrowers seem to be distributed almost equally across each ratings, except ‘D’ has the prominently high density.

Also listed the summary for each ratings as below.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2600    4500    6262    8000   35000
## loandata_completed$ProsperRating..Alpha.: A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3500    6000    8041   11000   35000 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: AA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    6500    8178   12000   25000 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    7500    8559   12500   35000 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    6000    7547   12000   25000 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    5000    6033    8000   15000 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    4000    4300    5500   15900 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: HR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2500    4000    3397    4000   16800 
## -------------------------------------------------------- 
## loandata_completed$ProsperRating..Alpha.: NA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2500    4500    6160    7920   25000

Finally, I would like to see what are the proportion of defaulted/charged-off loans.

The first plot shows the proportion of completed loans, defaulted loans, and charged-off loans. In the second plot, I just combined defaulted loans and charge-off loans togerther, to compare with the completed loans.

## [1] 0.6911989

Almost 70 % of closed loans are completed without being defaulted/charge-off.

Univariate Analysis

What is the structure of your dataset?

Original Prosperloan dataset consists of 81 variables, with approximately 114,000 observations. I have created a subset of 55,084 observations that have closed status and also only use 16 variables.

What is/are the main feature(s) of interest in your dataset?

The main features of interest are Categorical factor variables listed as below:

  • IncomeRange: $0, $1-24,999, $25,000-49,999, $50,000-74,999, $75,000-99,999 , $100,000+, ‘Not displayed’, and ‘Not employed’
  • LoanStatus: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue
  • EmploymentStatus: Employed, Full-time, Not available, Not employed, Other, Part-time, Retired, and Self-employed
  • IsBorrowerHomeowner: TRUE, FALSE
  • ProsperRating..Alpha.: (Higher -> Lower) AA, A, B, C, D, E, HR
  • ProsperScore: Ranges from 1-10, with 10 being the best

What other features in the dataset do you think will help support your  investigation into your feature(s) of interest?

Other variables such as BorrowerRate, StatedMonthlyIncome, DebtToIncomeRatio, MonthlyLoanPayment, and Term, likely contribute to further analysis.

Did you create any new variables from existing variables in the dataset?

  • IsDefaulted: ‘Defaulted/Charge-off’: defaulted or charged off, ‘Completed’: Completed, determined based on LoanStatus.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

  • LoanOriginalAmount: I have log-transformed the right skewed distribution in the second plot of this section.
  • StatedMonthlyIncome: Histogram eliminated some outliers such as monthly income of 618,548 that appears to be unrealistic for a personal loan borrower.

Bivariate Plots Section

Correlations between variables

Firstly I have plotted a correlation matrix as below.

  • Stated Monthly Income seems to have quite strong correlation with Prosper Rating(0.0697) and Prosper Score(0.0558).
  • Borrower Rate also seems to have strong correlation with Stated Monthly Income(-0.0691).
  • There also seems to be some correlation between loan amount and risk indicators such as Prosper rating and Prosper Score.

Now, I would like to look into the relation among individual variables more closely.

Firstly, I would like to see the relation between Prosper Score and Monthly Income, and also Prosper rating and monthly income, that are showing the strongest correlation in above chart.

  • The first plot shows the trend that the higher the bowroower’s income, the better rating the borrowers get.
  • Similar trend can be observed with Prosper score in the second plot, though it is not as drastic as wtih the rating.

Next, I would like to see how borrowers rate differs across Prosper rating and Prosper score.

  • Seemingly, though each plot showing oppsite direction, both plots by rating and by score are showing predictable distribution.
  • The higher the risk, the higher loan rate the borrowers are charged.

Next, I would like to see the correlation between Borrower rate and Monthly Income that also showing quite strong correlation of -0.0691 in the first correlation matrix.

The lm line showing the negative trend between Stated Monthly Income and Borrower rate. The hihger the monthly income, the lower the rate.

Next, I would like to see how the original loan amount varies by Prosper ratings.

  • Loan amount seem to be ranged higher with better rating borrowers. With higher ratings between AA to C, mdeians are more than 6,000.
  • In contrast, with lower ratings between D, E and HR, median are 5,000 or less.

It becomes more obvious when comparing the total in number and in amount for each ratings.

  • This plot compares the the total number of borrowers and total amount of loan for each rating.
  • Although the higher graded borrowers are alomost equal or relatively smaller than lower graded borrowers in number, they are much larger in loan amount.

What are the difference between Completed loans and Defaulted/Charge-off loans?

Next, I would like to explore what factors could affect whether or not a loan is to be defaulted?

Firstly, I would like to look this by loan amount.

  • I was expecting to see more defaulted loans with the higher amount loans as I thought the more loan amount, the more difficult to complete repayment.
  • To my suprise, however, above histograms are showing similar distribution pattern. There does not seem to be much difference in distribution by loan amount.
  • This suggests that loan amount alone is not a strong factor to determine whether or not th eloan is defaulted.

Next, I would like to look into the distribution by monthly payment amount.

## loandata_completed$IsDefaulted: Completed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   94.15  171.10  218.78  297.00 2251.51 
## -------------------------------------------------------- 
## loandata_completed$IsDefaulted: Defaulted/Charge-off
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   107.4   173.7   234.8   307.2  1552.8
  • There does not seem much difference in distribution, too.
  • When looking at the summaries, monthly payment amount of defaulted/charge-off loans is slightly higher than completed loans. For completed loans, median is 171.10 and mean is 218.78 but for defaulted/charge-off loans, median is 173.7 and mean is 234.8.

What can we see if we look this from the borrower rate?

Above plot compares the distribution by borrower rate.

  • First histogram, representing completed loans, showing slight right skewed distiribution with its peak around 0.1.
  • On the other hand, the second histogram, representing defaulted/charge-off, showing left skewed distribution with its peak around 0.3.
  • This is interesting to see that the defaulted loans are more likely with higher borrower rate.

As we have seen earlier, the borrower rate seems to be determined based on risk factors, such as Prosper rating and Prosper scores.

Let’s have a look at the distribution by rating and score to see how it differs between completed loans and defaullted loans.

## loandata_completed$IsDefaulted: Completed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   3.376   7.000  11.000 
## -------------------------------------------------------- 
## loandata_completed$IsDefaulted: Defaulted/Charge-off
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   2.023   4.000  11.000
  • First set of plots with distribution by Prosper rating, Defaulted/Charge-off distribution seem to be negatively skewed to lower ratings comparing to Completed loans.

** The proportion of AA and A seem to be much smaller for Defaulted/Charge-off loans than Completed loans.

** Lower ratings such as E and HR have larger proportion with defaulted loans.

  • Similar tendency can be seen with the second set of plots by Prosper scores.

** Defaulted/Charge-off loans has its peak around 5-6 whereas Completed loans has around 8.

** Summary shows 3rd quatile of Defaulted/charge-off loans is 7, whereas that of Completed loans is 8.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

How did the feature(s) of interest vary with other features in the dataset?

  • There seems to be quite strong correlation between Borrower rate, Stated Monthly Income, Prosper rating, and Prosper score.
  • Risk indicator such as ratings and scores seem to be showing some relation with whether or not the loan is defaulted.
  • It may make sense to interpret that higher loan rate makes it difficult for lower rating borrowers to borrow larger amount, since higher loan rate consequently increases the total repayment amount.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

  • Unfortunately I did not find any intersting relationships between the features other than the main features.

What was the strongest relationship you found?

  • Prosper Score and borrower rate and also Prosper rating and borrower rate are both showing quite strong correlation.

Multivariate Plots Section

I would like to look in the separated plots as below to see how the loan amount differs by Prosper rating.

Seemingly, borrowers with higher rating tend to borrow more amount with lower rate than lower graded borrowers.

Do the borrower with better rating or score repay more?

## loandata_completed$IsDefaulted: Completed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2917    4417    5324    6583  618548 
## -------------------------------------------------------- 
## loandata_completed$IsDefaulted: Defaulted/Charge-off
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2500    3750    4451    5417  208333

Finally I would like to see how the defaulted loans are distributed across Prosper rating and Prosper score on below scatter plot.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

It seems Prosper rating and Prosper score have quite strong relation though it is difficult to say if this relation is strengthening each other. However, these 2 variables seems to be correaltes also with borrower rate.

Were there any interesting or surprising interactions between features?

It is not surprising to see there is strong correlation between risk indicators and borrower rate.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

N/A


Final Plots and Summary

Plot One

Description One

For defaulted loans, the borrower rate range is distributed more in higher rates, comparing to the completed loans. This may suggest that the higher the loan rate, the more difficult for borrowers to complete the loan.

Plot Two

Description Two

Ratings and Scores seem to have quite strong correlation with borrower rate. These risk indicators semms to be used as a basis when determining borrower rate.

So, are these risk indicator effective to reduce the risk of default?

Plot Three

Description Three

Seemingly, defaulted/charge-off loans are distributed more in lower ratings such as D, E, and HR. However, defaulted loans still can be seen even with higher ratings like AA or A.

Reflection

Future work:

  • In this report, analysis was conducted for closed loans aiming to find any patterns and any factors that may affect whether or not a loan could go default.
  • As a future work, I am interested in making any prediction on how much portion of current loans could go default.
  • If I could have some more details about this dataset, especially the factors that determine ratings and scores, it might have been possible to find the features that strengthened each other, that possibly help to make such prediction.

Extra: Logistic Regression

After completed the Udacity nanodegree course, revisited this dataset with attempt to establish a logistic regression model.

### prepare a sample dataset
set.seed(20022012)

### add new variable Defaulted by converting IsDefaulted to boolean

loandata_completed$Defaulted <- ifelse(
  loandata_completed$LoanStatus == 'Defaulted' | 
    loandata_completed$LoanStatus == 'Chargedoff', TRUE,
  FALSE)

loandata_completed_train<- subset(
  loandata_completed,
  select = c(LoanOriginalAmountRange, 
             BorrowerRate,
             StatedMonthlyIncomeRange,
             DebtToIncomeRatio, 
             MonthlyLoanPayment,
             ProsperScore,
             ProsperRating..Alpha.,
             CurrentCreditLines,
             EmploymentStatusDuration,
             OpenRevolvingAccounts, 
             OpenRevolvingMonthlyPayment,
             BankcardUtilization,
             RevolvingCreditBalance,
             AvailableBankcardCredit,
             TotalInquiries,
             CurrentDelinquencies,
             Term,
             IncomeVerifiable,
             CreditScoreRangeLower,
             Defaulted
             ))

###split data in to train and test dataset
library(caTools)
set.seed(144)
spl = sample.split(loandata_completed_train$Defaulted, 0.7)
train_set = subset(loandata_completed_train, spl == TRUE)
test_set = subset(loandata_completed_train, spl == FALSE)
### fitting
#model <- glm(Defaulted~.,family=binomial(link='logit'),data=loandata_completed_train)
model <- glm(Defaulted~.,family=binomial(link='logit'),data=train_set)

Interpretation

## 
## Call:
## glm(formula = Defaulted ~ ., family = binomial(link = "logit"), 
##     data = train_set)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7741  -0.8352  -0.5553   1.0165   2.7604  
## 
## Coefficients:
##                                             Estimate Std. Error z value
## (Intercept)                               -3.602e+00  2.001e-01 -18.004
## LoanOriginalAmountRange$1,000 - $1,999     3.077e-01  5.094e-02   6.040
## LoanOriginalAmountRange$2,000 - $2,999     3.577e-01  5.385e-02   6.643
## LoanOriginalAmountRange$3,000 - $3,999     5.096e-01  6.013e-02   8.474
## LoanOriginalAmountRange$4,000 - $4,999     4.807e-01  6.531e-02   7.360
## LoanOriginalAmountRange$5,000 - $5,999     2.813e-01  8.138e-02   3.457
## LoanOriginalAmountRange$6,000 - $6,999     4.540e-01  8.260e-02   5.496
## LoanOriginalAmountRange$7,000 - $7,999     3.947e-01  1.036e-01   3.810
## LoanOriginalAmountRange$8,000 - $8,999     3.788e-01  1.128e-01   3.358
## LoanOriginalAmountRange$9,000 - $9,999     3.155e-01  1.046e-01   3.016
## LoanOriginalAmountRange$10,000 - $10,999   6.821e-02  1.650e-01   0.413
## LoanOriginalAmountRange$11,000 - $11,999   1.200e-01  1.428e-01   0.840
## LoanOriginalAmountRange$12,000 - $12,999  -1.739e-02  1.739e-01  -0.100
## LoanOriginalAmountRange$13,000 - $13,999   1.367e-01  1.884e-01   0.726
## LoanOriginalAmountRange$14,000 - $14,999   1.036e-01  1.422e-01   0.728
## LoanOriginalAmountRange$15,000 - $15,999   8.235e-02  2.461e-01   0.335
## LoanOriginalAmountRange$16,000 - $16,999   1.997e-01  2.443e-01   0.817
## LoanOriginalAmountRange$17,000 - $17,999   2.172e-02  2.469e-01   0.088
## LoanOriginalAmountRange$18,000 - $19,999  -4.086e-01  3.299e-01  -1.238
## LoanOriginalAmountRange$20,000 or more    -4.410e-02  2.035e-01  -0.217
## BorrowerRate                               7.057e+00  2.506e-01  28.157
## StatedMonthlyIncomeRange$1,000 - $1,999   -3.683e-02  4.512e-02  -0.816
## StatedMonthlyIncomeRange$2,000 - $2,999   -3.522e-01  4.671e-02  -7.540
## StatedMonthlyIncomeRange$3,000 - $3,999   -5.268e-01  4.948e-02 -10.646
## StatedMonthlyIncomeRange$4,000 - $4,999   -5.639e-01  5.268e-02 -10.705
## StatedMonthlyIncomeRange$5,000 - $5,999   -6.610e-01  5.941e-02 -11.126
## StatedMonthlyIncomeRange$6,000 - $6,999   -8.260e-01  6.825e-02 -12.103
## StatedMonthlyIncomeRange$7,000 - $7,999   -9.607e-01  7.916e-02 -12.136
## StatedMonthlyIncomeRange$8,000 - $8,999   -9.881e-01  1.008e-01  -9.800
## StatedMonthlyIncomeRange$9,000 - $9,999   -9.920e-01  9.963e-02  -9.957
## StatedMonthlyIncomeRange$10,000 - $10,999 -1.099e+00  1.470e-01  -7.475
## StatedMonthlyIncomeRange$11,000 - $11,999 -9.183e-01  1.351e-01  -6.799
## StatedMonthlyIncomeRange$12,000 - $12,999 -9.698e-01  1.805e-01  -5.374
## StatedMonthlyIncomeRange$13,000 - $13,999 -1.061e+00  2.275e-01  -4.661
## StatedMonthlyIncomeRange$14,000 - $14,999 -1.151e+00  2.156e-01  -5.340
## StatedMonthlyIncomeRange$15,000 - $15,999 -7.864e-01  2.043e-01  -3.849
## StatedMonthlyIncomeRange$16,000 - $16,999 -1.359e+00  3.585e-01  -3.792
## StatedMonthlyIncomeRange$17,000 - $17,999 -1.372e+00  4.047e-01  -3.390
## StatedMonthlyIncomeRange$18,000 - $19,999 -7.237e-01  3.992e-01  -1.813
## StatedMonthlyIncomeRange$20,000 or more   -1.484e+00  1.591e-01  -9.326
## DebtToIncomeRatio                          3.028e-02  1.687e-02   1.795
## MonthlyLoanPayment                         2.220e-03  2.413e-04   9.199
## ProsperScore                               3.452e-02  1.247e-02   2.767
## ProsperRating..Alpha.AA                   -5.345e-01  1.555e-01  -3.437
## ProsperRating..Alpha.B                     2.085e-02  8.785e-02   0.237
## ProsperRating..Alpha.C                    -8.297e-02  8.819e-02  -0.941
## ProsperRating..Alpha.D                    -8.446e-02  8.882e-02  -0.951
## ProsperRating..Alpha.E                    -6.736e-02  1.052e-01  -0.640
## ProsperRating..Alpha.HR                    1.545e-01  1.125e-01   1.373
## ProsperRating..Alpha.NA                    8.383e-01  1.299e-01   6.455
## CurrentCreditLines                         1.164e-02  3.551e-03   3.279
## EmploymentStatusDuration                   4.705e-04  1.582e-04   2.974
## OpenRevolvingAccounts                     -3.930e-02  4.873e-03  -8.066
## OpenRevolvingMonthlyPayment                4.364e-04  5.112e-05   8.538
## BankcardUtilization                       -1.869e-01  3.982e-02  -4.693
## RevolvingCreditBalance                    -1.092e-06  6.418e-07  -1.702
## AvailableBankcardCredit                   -3.054e-06  9.659e-07  -3.162
## TotalInquiries                             3.726e-02  1.758e-03  21.192
## CurrentDelinquencies                       9.294e-02  5.599e-03  16.598
## Term                                       3.077e-02  2.384e-03  12.908
## IncomeVerifiableTrue                      -1.491e-01  4.678e-02  -3.187
## CreditScoreRangeLower                     -1.363e-03  1.361e-04 -10.009
##                                           Pr(>|z|)    
## (Intercept)                                < 2e-16 ***
## LoanOriginalAmountRange$1,000 - $1,999    1.54e-09 ***
## LoanOriginalAmountRange$2,000 - $2,999    3.08e-11 ***
## LoanOriginalAmountRange$3,000 - $3,999     < 2e-16 ***
## LoanOriginalAmountRange$4,000 - $4,999    1.84e-13 ***
## LoanOriginalAmountRange$5,000 - $5,999    0.000547 ***
## LoanOriginalAmountRange$6,000 - $6,999    3.88e-08 ***
## LoanOriginalAmountRange$7,000 - $7,999    0.000139 ***
## LoanOriginalAmountRange$8,000 - $8,999    0.000784 ***
## LoanOriginalAmountRange$9,000 - $9,999    0.002561 ** 
## LoanOriginalAmountRange$10,000 - $10,999  0.679366    
## LoanOriginalAmountRange$11,000 - $11,999  0.400664    
## LoanOriginalAmountRange$12,000 - $12,999  0.920344    
## LoanOriginalAmountRange$13,000 - $13,999  0.467923    
## LoanOriginalAmountRange$14,000 - $14,999  0.466356    
## LoanOriginalAmountRange$15,000 - $15,999  0.737929    
## LoanOriginalAmountRange$16,000 - $16,999  0.413646    
## LoanOriginalAmountRange$17,000 - $17,999  0.929896    
## LoanOriginalAmountRange$18,000 - $19,999  0.215537    
## LoanOriginalAmountRange$20,000 or more    0.828435    
## BorrowerRate                               < 2e-16 ***
## StatedMonthlyIncomeRange$1,000 - $1,999   0.414326    
## StatedMonthlyIncomeRange$2,000 - $2,999   4.72e-14 ***
## StatedMonthlyIncomeRange$3,000 - $3,999    < 2e-16 ***
## StatedMonthlyIncomeRange$4,000 - $4,999    < 2e-16 ***
## StatedMonthlyIncomeRange$5,000 - $5,999    < 2e-16 ***
## StatedMonthlyIncomeRange$6,000 - $6,999    < 2e-16 ***
## StatedMonthlyIncomeRange$7,000 - $7,999    < 2e-16 ***
## StatedMonthlyIncomeRange$8,000 - $8,999    < 2e-16 ***
## StatedMonthlyIncomeRange$9,000 - $9,999    < 2e-16 ***
## StatedMonthlyIncomeRange$10,000 - $10,999 7.74e-14 ***
## StatedMonthlyIncomeRange$11,000 - $11,999 1.06e-11 ***
## StatedMonthlyIncomeRange$12,000 - $12,999 7.70e-08 ***
## StatedMonthlyIncomeRange$13,000 - $13,999 3.14e-06 ***
## StatedMonthlyIncomeRange$14,000 - $14,999 9.30e-08 ***
## StatedMonthlyIncomeRange$15,000 - $15,999 0.000118 ***
## StatedMonthlyIncomeRange$16,000 - $16,999 0.000150 ***
## StatedMonthlyIncomeRange$17,000 - $17,999 0.000700 ***
## StatedMonthlyIncomeRange$18,000 - $19,999 0.069846 .  
## StatedMonthlyIncomeRange$20,000 or more    < 2e-16 ***
## DebtToIncomeRatio                         0.072653 .  
## MonthlyLoanPayment                         < 2e-16 ***
## ProsperScore                              0.005651 ** 
## ProsperRating..Alpha.AA                   0.000589 ***
## ProsperRating..Alpha.B                    0.812395    
## ProsperRating..Alpha.C                    0.346833    
## ProsperRating..Alpha.D                    0.341676    
## ProsperRating..Alpha.E                    0.521984    
## ProsperRating..Alpha.HR                   0.169764    
## ProsperRating..Alpha.NA                   1.08e-10 ***
## CurrentCreditLines                        0.001041 ** 
## EmploymentStatusDuration                  0.002942 ** 
## OpenRevolvingAccounts                     7.28e-16 ***
## OpenRevolvingMonthlyPayment                < 2e-16 ***
## BankcardUtilization                       2.69e-06 ***
## RevolvingCreditBalance                    0.088737 .  
## AvailableBankcardCredit                   0.001568 ** 
## TotalInquiries                             < 2e-16 ***
## CurrentDelinquencies                       < 2e-16 ***
## Term                                       < 2e-16 ***
## IncomeVerifiableTrue                      0.001438 ** 
## CreditScoreRangeLower                      < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 47669  on 38558  degrees of freedom
## Residual deviance: 41058  on 38497  degrees of freedom
## AIC: 41182
## 
## Number of Fisher Scoring iterations: 5
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Defaulted
## 
## Terms added sequentially (first to last)
## 
## 
##                             Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                        38558      47669              
## LoanOriginalAmountRange     19   124.87     38539      47545 < 2.2e-16 ***
## BorrowerRate                 1  2867.17     38538      44677 < 2.2e-16 ***
## StatedMonthlyIncomeRange    19   645.42     38519      44032 < 2.2e-16 ***
## DebtToIncomeRatio            1    18.69     38518      44013 1.536e-05 ***
## MonthlyLoanPayment           1    11.42     38517      44002 0.0007246 ***
## ProsperScore                 1  1274.22     38516      42728 < 2.2e-16 ***
## ProsperRating..Alpha.        7   396.33     38509      42331 < 2.2e-16 ***
## CurrentCreditLines           1    25.02     38508      42306 5.680e-07 ***
## EmploymentStatusDuration     1     4.50     38507      42302 0.0339317 *  
## OpenRevolvingAccounts        1    13.97     38506      42288 0.0001853 ***
## OpenRevolvingMonthlyPayment  1    46.92     38505      42241 7.382e-12 ***
## BankcardUtilization          1    52.30     38504      42189 4.752e-13 ***
## RevolvingCreditBalance       1     2.54     38503      42186 0.1108529    
## AvailableBankcardCredit      1    24.64     38502      42161 6.908e-07 ***
## TotalInquiries               1   515.54     38501      41646 < 2.2e-16 ***
## CurrentDelinquencies         1   315.59     38500      41330 < 2.2e-16 ***
## Term                         1   170.04     38499      41160 < 2.2e-16 ***
## IncomeVerifiable             1     5.99     38498      41154 0.0143825 *  
## CreditScoreRangeLower        1    96.53     38497      41058 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the results: * BorrowerRate and StatedMonthlyIncome have the lowest p-value of 2e-16. This suggests strong association of between these two variables and the probability of the loan to go default. * EmploymentStatusDuration, IncomeVerifiable, and RevolvingCreditBalance have less correlation.

Memo: * When running anova(), first attempt caused perfect separation warning and LoanOriginalAmount seems to be responsible for this warning. * Applied binning on LoanOriginalAmount, but still throwing the warning. StatedMonthlyIncome also seems to be causing this warning. * After binning both LoanOriginalAmount and StatedMonthlyIncome, the warning is no longer thrown.

Assessing the predictive ability of the model

### predict
fitted.results <- predict(model,newdata=test_set,type='response')
### examine accuracy
threshold = 0.5

fitted.results <- ifelse(fitted.results > threshold,1,0)
misClasificError <- mean(fitted.results != test_set$Defaulted)

print(paste("Misclacific err::",misClasificError))
## [1] "Misclacific err:: 0.275824508320726"
print(paste('Accuracy',1-misClasificError))
## [1] "Accuracy 0.724175491679274"
# compute McFadden R2

print("R2: ")
## [1] "R2: "
pR2(model)
##           llh       llhNull            G2      McFadden          r2ML 
## -2.052888e+04 -2.383474e+04  6.611714e+03  1.386991e-01  1.575745e-01 
##          r2CU 
##  2.220813e-01
### Confusion Matrix
confusion_matrix <- table(test_set$Defaulted, as.numeric(fitted.results >= threshold))

cat("\n")
print("Confusion Matrix: ")
## [1] "Confusion Matrix: "
confusion_matrix
##        
##             0     1
##   FALSE 10427   995
##   TRUE   3563  1540
tn <- confusion_matrix[1]
fn <- confusion_matrix[2]
fp <- confusion_matrix[3]
tp <- confusion_matrix[4]

cat("\n")
print(paste('True Negative: ',tn))
## [1] "True Negative:  10427"
print(paste('False Negative: ',fn))
## [1] "False Negative:  3563"
print(paste('False Positive: ',fp))
## [1] "False Positive:  995"
print(paste('True Positive: ',tp))
## [1] "True Positive:  1540"
print(paste('Recall: ', tp/(tp + fn)))
## [1] "Recall:  0.301783264746228"
print(paste('Precision: ',tp/(tp + fp)))
## [1] "Precision:  0.607495069033531"
# accuracy based on above confusion matrix
print(paste('Accuracy:', (tn + tp)/sum(confusion_matrix)))
## [1] "Accuracy: 0.724175491679274"

We got accuracy of 72.4% Let’s examine the accuracy by conducting coross validation.

Cross Validation

predictors_vector <- c("BorrowerRate", "DebtToIncomeRatio","LoanOriginalAmountRange","StatedMonthlyIncomeRange","MonthlyLoanPayment",
"ProsperScore","ProsperRating..Alpha.",
"CurrentCreditLines","EmploymentStatusDuration","Term",
"BankcardUtilization","RevolvingCreditBalance","AvailableBankcardCredit","TotalInquiries","CurrentDelinquencies"
)

predictors_vector_sum <- paste(predictors_vector, collapse="+")
logReg_model_fmla <- as.formula(paste("Defaulted ~ ", predictors_vector_sum))

fitControl = trainControl( method = "cv", number = 10, p = 0.70 )

### Convert `Defaulted` column from boolean to factor
#train_set$Defaulted <-as.integer(as.logical(train_set$Defaulted))
train_set$Defaulted <-as.factor(as.logical(train_set$Defaulted))

logReg_model_cv <- train(logReg_model_fmla, data=train_set, method = "bayesglm", trControl = fitControl)

logReg_model_cv$resample$Accuracy
##  [1] 0.7160270 0.7203632 0.7321058 0.7249157 0.7282158 0.7266598 0.7282158
##  [8] 0.7282158 0.7334025 0.7260700
mean(logReg_model_cv$resample$Accuracy)
## [1] 0.7264191

Above cross validation results seem to match the predicted accuracy of 72.4%.

ROC and AUC

p <- predict(model, newdata=test_set, type="response")
pr <- prediction(p, test_set$Defaulted)

# Performance function
prf <- performance(pr, measure = "tpr", x.measure = "fpr")

# Plot ROC curve
plot(prf,colorize=TRUE,print.cutoffs.at=seq(0,1,by=0.1), text.adj=c(-0.2,1.7))
grid()

# AUC
auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
print(paste("AUC:", auc))
## [1] "AUC: 0.734155867332897"
# Recall-Precision curve
prf <- performance(pr, measure = "prec", x.measure = "rec")
plot (prf);

# F1 score
prf <- performance(pr, "f")
plot (prf);

Optimization

We have used threshold of 0.5 in previous analysis and we succeeded in identifying 1,540 true positives, but still missing 3,563 false positives.

Let’s see if changing threshold improve the prediction ability.

fitted.results <- predict(model,newdata=test_set,type='response')

thresholds <- seq(0.1,1.0,by=0.05)

for(threshold in thresholds){
  results.over.threshold <- ifelse(fitted.results > threshold,1,0)
  misClasificError <- mean(results.over.threshold != test_set$Defaulted)
  mean(results.over.threshold)

  confusion_matrix <- table(test_set$Defaulted, as.numeric(results.over.threshold >= threshold))

  print(paste('Threshold: ',threshold))
  print(confusion_matrix)

  print(misClasificError)
  print(paste('Accuracy',1-misClasificError))
  
  tn <- confusion_matrix[1]
  fn <- confusion_matrix[2]
  fp <- confusion_matrix[3]
  tp <- confusion_matrix[4]
  
  print(paste('False Negative: ',fn))
  print(paste('True Positive: ',tp))
  
  recall <- tp/(tp + fn)
  precision <- tp/(tp + fp)
  f1 <- 2* precision * recall /(precision + recall)
  print(paste('Recall: ', recall))
  
  print(paste('Precision: ',precision))
  print(paste('F1: ',f1))
  cat("\n")
  }
## [1] "Threshold:  0.1"
##        
##            0    1
##   FALSE 1786 9636
##   TRUE   117 4986
## [1] 0.5901967
## [1] "Accuracy 0.409803328290469"
## [1] "False Negative:  117"
## [1] "True Positive:  4986"
## [1] "Recall:  0.977072310405644"
## [1] "Precision:  0.340993024210094"
## [1] "F1:  0.505551330798479"
## 
## [1] "Threshold:  0.15"
##        
##            0    1
##   FALSE 3214 8208
##   TRUE   304 4799
## [1] 0.5150983
## [1] "Accuracy 0.484901664145235"
## [1] "False Negative:  304"
## [1] "True Positive:  4799"
## [1] "Recall:  0.940427199686459"
## [1] "Precision:  0.368955177981087"
## [1] "F1:  0.529983434566538"
## 
## [1] "Threshold:  0.2"
##        
##            0    1
##   FALSE 4760 6662
##   TRUE   635 4468
## [1] 0.4415734
## [1] "Accuracy 0.558426626323752"
## [1] "False Negative:  635"
## [1] "True Positive:  4468"
## [1] "Recall:  0.875563394081913"
## [1] "Precision:  0.401437556154537"
## [1] "F1:  0.550483582825109"
## 
## [1] "Threshold:  0.25"
##        
##            0    1
##   FALSE 6163 5259
##   TRUE  1089 4014
## [1] 0.3841452
## [1] "Accuracy 0.615854765506808"
## [1] "False Negative:  1089"
## [1] "True Positive:  4014"
## [1] "Recall:  0.786596119929453"
## [1] "Precision:  0.432869621481721"
## [1] "F1:  0.558430717863105"
## 
## [1] "Threshold:  0.3"
##        
##            0    1
##   FALSE 7402 4020
##   TRUE  1584 3519
## [1] 0.3391225
## [1] "Accuracy 0.660877458396369"
## [1] "False Negative:  1584"
## [1] "True Positive:  3519"
## [1] "Recall:  0.689594356261023"
## [1] "Precision:  0.466772781536013"
## [1] "F1:  0.55671570953963"
## 
## [1] "Threshold:  0.35"
##        
##            0    1
##   FALSE 8459 2963
##   TRUE  2117 2986
## [1] 0.307413
## [1] "Accuracy 0.692586989409985"
## [1] "False Negative:  2117"
## [1] "True Positive:  2986"
## [1] "Recall:  0.5851459925534"
## [1] "Precision:  0.501933097999664"
## [1] "F1:  0.540354686934491"
## 
## [1] "Threshold:  0.4"
##        
##            0    1
##   FALSE 9300 2122
##   TRUE  2625 2478
## [1] 0.2872617
## [1] "Accuracy 0.712738275340393"
## [1] "False Negative:  2625"
## [1] "True Positive:  2478"
## [1] "Recall:  0.48559670781893"
## [1] "Precision:  0.538695652173913"
## [1] "F1:  0.510769864990209"
## 
## [1] "Threshold:  0.45"
##        
##            0    1
##   FALSE 9958 1464
##   TRUE  3125 1978
## [1] 0.2777005
## [1] "Accuracy 0.722299546142209"
## [1] "False Negative:  3125"
## [1] "True Positive:  1978"
## [1] "Recall:  0.387615128355869"
## [1] "Precision:  0.5746658919233"
## [1] "F1:  0.46296079578701"
## 
## [1] "Threshold:  0.5"
##        
##             0     1
##   FALSE 10427   995
##   TRUE   3563  1540
## [1] 0.2758245
## [1] "Accuracy 0.724175491679274"
## [1] "False Negative:  3563"
## [1] "True Positive:  1540"
## [1] "Recall:  0.301783264746228"
## [1] "Precision:  0.607495069033531"
## [1] "F1:  0.403246923278345"
## 
## [1] "Threshold:  0.55"
##        
##             0     1
##   FALSE 10720   702
##   TRUE   3933  1170
## [1] 0.2804841
## [1] "Accuracy 0.719515885022693"
## [1] "False Negative:  3933"
## [1] "True Positive:  1170"
## [1] "Recall:  0.229276895943563"
## [1] "Precision:  0.625"
## [1] "F1:  0.335483870967742"
## 
## [1] "Threshold:  0.6"
##        
##             0     1
##   FALSE 10958   464
##   TRUE   4171   932
## [1] 0.2804841
## [1] "Accuracy 0.719515885022693"
## [1] "False Negative:  4171"
## [1] "True Positive:  932"
## [1] "Recall:  0.182637664119146"
## [1] "Precision:  0.667621776504298"
## [1] "F1:  0.286813355900908"
## 
## [1] "Threshold:  0.65"
##        
##             0     1
##   FALSE 11134   288
##   TRUE   4405   698
## [1] 0.2839939
## [1] "Accuracy 0.716006051437216"
## [1] "False Negative:  4405"
## [1] "True Positive:  698"
## [1] "Recall:  0.136782284930433"
## [1] "Precision:  0.707910750507099"
## [1] "F1:  0.229265889308589"
## 
## [1] "Threshold:  0.7"
##        
##             0     1
##   FALSE 11256   166
##   TRUE   4622   481
## [1] 0.2897428
## [1] "Accuracy 0.710257186081694"
## [1] "False Negative:  4622"
## [1] "True Positive:  481"
## [1] "Recall:  0.0942582794434646"
## [1] "Precision:  0.743431221020093"
## [1] "F1:  0.167304347826087"
## 
## [1] "Threshold:  0.75"
##        
##             0     1
##   FALSE 11327    95
##   TRUE   4760   343
## [1] 0.2937973
## [1] "Accuracy 0.706202723146747"
## [1] "False Negative:  4760"
## [1] "True Positive:  343"
## [1] "Recall:  0.0672153635116598"
## [1] "Precision:  0.78310502283105"
## [1] "F1:  0.1238043674427"
## 
## [1] "Threshold:  0.8"
##        
##             0     1
##   FALSE 11366    56
##   TRUE   4908   195
## [1] 0.3003933
## [1] "Accuracy 0.699606656580938"
## [1] "False Negative:  4908"
## [1] "True Positive:  195"
## [1] "Recall:  0.0382128159905938"
## [1] "Precision:  0.776892430278884"
## [1] "F1:  0.0728427344041838"
## 
## [1] "Threshold:  0.85"
##        
##             0     1
##   FALSE 11390    32
##   TRUE   4991   112
## [1] 0.3039637
## [1] "Accuracy 0.696036308623298"
## [1] "False Negative:  4991"
## [1] "True Positive:  112"
## [1] "Recall:  0.0219478737997256"
## [1] "Precision:  0.777777777777778"
## [1] "F1:  0.0426910615589861"
## 
## [1] "Threshold:  0.9"
##        
##             0     1
##   FALSE 11404    18
##   TRUE   5047    56
## [1] 0.3065053
## [1] "Accuracy 0.693494704992436"
## [1] "False Negative:  5047"
## [1] "True Positive:  56"
## [1] "Recall:  0.0109739368998628"
## [1] "Precision:  0.756756756756757"
## [1] "F1:  0.0216341510527332"
## 
## [1] "Threshold:  0.95"
##        
##             0     1
##   FALSE 11416     6
##   TRUE   5086    17
## [1] 0.3081392
## [1] "Accuracy 0.691860816944024"
## [1] "False Negative:  5086"
## [1] "True Positive:  17"
## [1] "Recall:  0.00333137370174407"
## [1] "Precision:  0.739130434782609"
## [1] "F1:  0.00663285212641436"
## 
## [1] "Threshold:  1"
##        
##             0
##   FALSE 11422
##   TRUE   5103
## [1] 0.3088048
## [1] "Accuracy 0.691195158850227"
## [1] "False Negative:  5103"
## [1] "True Positive:  NA"
## [1] "Recall:  NA"
## [1] "Precision:  NA"
## [1] "F1:  NA"

Threshold of 0.4 seems to score most balanced result, we have F1 score of 0.51 that is closest to 0.5, precision higher than 0.5, and also accuracy higher than 71%.